Parallel LINQ Queries (PLINQ)

To wrap up your look at the TPL, be aware that there is another way you can incorporate parallel tasks into your .NET applications. If you wish, you can make use of a new set of extension methods, which allow you to construct a LINQ query that will perform its workload in parallel (if possible). Fittingly, LINQ queries that are designed to run in parallel are termed PLINQ queries.

Like parallel code authored using the Parallel class, PLINQ has the option of ignoring your request to process the collection in parallel if need be. The PLINQ framework has been optimized in numerous ways, which includes determining if a query would in fact perform faster in a synchronous manner.

At run time, PLINQ analyzes the overall structure of the query and if the query is likely to benefit from parallelization, it will run concurrently. However, if parallelizing a query would hurt performance PLINQ just runs the query sequentially. If PLINQ has a choice between a potentially expensive parallel algorithm or an inexpensive sequential algorithm, it chooses the sequential algorithm by default.

The necessary extension methods are found within the ParallelEnumerable class of the System.Linq namespace. Table 19-5 documents some useful PLINQ extensions.

Table 19-4. Select members of the ParallelEnumerable class

Member Meaning in Life
AsParallel() Specifies that the rest of the query should be parallelized, if possible.
WithCancellation() Specifies that PLINQ should periodically monitor the state of the provided cancellation token and cancel execution if it is requested.
WithDegreeOfParallelism() Specifies the maximum number of processors that PLINQ should use to parallelize the query.
ForAll() Enables results to be processed in parallel without first merging back to the consumer thread, as would be the case when enumerating a LINQ result using the foreach keyword.

To see PLINQ in action, create a final Windows Forms application named PLINQDataProcessingWithCancellation and import the System.Threading and System.Threading.Tasks namespaces. This simple form will only need two Button controls named btnExecute and btnCancel. Then the “Execute” button is clicked, you will fire off a new Task which executes a LINQ query that investigates a very large array of integers, looking for only the items x % 3 == 0 is true. Here is a nonparallel version of the query:

public partial class MainForm : Form
{
...
    private void btnExecute_Click(object sender, EventArgs e)
    {
        // Start a new "task" to process the ints.
        Task.Factory.StartNew(() =>
        {
            ProcessIntData();
        });
    }

    private void ProcessIntData()
    {
        // Get a very large array of integers.
        int[] source = Enumerable.Range(1, 10000000).ToArray();

        // Find the numbers where num % 3 == 0 is true, returned
        // in descending order.
        int[] modThreeIsZero = (from num in source where num % 3 == 0
            orderby num descending select num).ToArray();
    
        MessageBox.Show(string.Format("Found {0} numbers that match query!",
            modThreeIsZero.Count()));
    }
}

Opting in to a PLINQ Query

If you wish to inform the TPL to execute this query in parallel (if possible), you will want to make use of the AsParallel() extension method as so:

int[] modThreeIsZero = (from num in source.AsParallel() where num % 3 == 0
    orderby num descending select num).ToArray();

Notice how the overall format of the LINQ query is identical to what you have seen in previous chapters. However, by including a call to AsParallel(), the TPL will attempt to pass the workload off to an available CPU.

Canceling a PLINQ Query

It is also possible to use a CancellationTokenSource object to inform a PLINQ query to stop processing under the correct conditions (typically due to user intervention). Declare a form level CancellationTokenSource object named cancelToken and implement the Click handler of the btnCancel to call the Cancel() method on this object. Here is the relevant code update:

public partial class MainForm : Form
{
    private CancellationTokenSource cancelToken = new CancellationTokenSource();
    private void btnCancel_Click(object sender, EventArgs e)
    {
        cancelToken.Cancel();
    }
...
}

Now, inform the PLINQ query that it should be on the lookout for an incoming cancellation request by chaining on the WithCancellation() extension method, and passing in the token. In addition, you will want to wrap this PLINQ query in a proper try/catch scope, and deal with the possible exception. Here is the final version of the ProcessIntData() method.

private void ProcessIntData()
{
    // Get a very large array of integers.
    int[] source = Enumerable.Range(1, 10000000).ToArray();

    // Find the numbers where num % 3 == 0 is true, returned
    // in descending order.
    int[] modThreeIsZero = null;
    try
    {
        modThreeIsZero = (from num in
            source.AsParallel().WithCancellation(cancelToken.Token)
            where num % 3 == 0 orderby num descending
            select num).ToArray();
    }
    catch (OperationCanceledException ex)
    {
        this.Text = ex.Message;
    }

    MessageBox.Show(string.Format("Found {0} numbers that match query!",
        modThreeIsZero.Count()));
}

That wraps up the introductory look at the Task Parallel Library and PLINQ. As mentioned, these new .NET 4.0 APIs are quickly becoming the preferred manner to work with multithreaded applications. However, effective use of TPL and PLINQ demand a solid understanding of multithreaded concepts and primitives, which you have been exposed to over the course of this chapter.

If you find these topics interesting, be sure to look up the topic “Parallel Programming in the .NET Framework” in the .NET Framework 4.0 SDK documentation. Here you will find a good number of sample projects which will extend what you have seen here.

Source Code The PLINQDataProcessingWithCancellation project is included under the Chapter 19 subdirectory.